Programming Massively Parallel Processors: A Hands-on Approach: The Genesis of GPU Computing

The birth of the GPU was a radical departure from CPU design, driven by the "real-time imperative": the non-negotiable requirement to render complex 3D scenes within a 1/60th-of-a-second (16.67 ms) window per frame to sustain 60 frames per second. CPUs, by contrast, followed a trajectory optimized for low-latency serial execution (later extended to multiple cores), and they hit a wall as display resolutions and scene complexity increased.
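
The 16.67 ms figure follows directly from the target frame rate: each frame must finish within 1000/fps milliseconds. A minimal Python sketch of that arithmetic (the function name `frame_budget_ms` is illustrative, not from the text):

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to render one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (30, 60, 100):
    # Two decimals matches how frame budgets are usually quoted.
    print(f"{fps:>3} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
```

Running it lists budgets of 33.33 ms, 16.67 ms, and 10.00 ms: doubling the frame rate halves the time available for every frame.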

1. The 16.67ms Constraint

By the mid-1990s, 3D gaming had reached a crisis. A serial CPU that was also handling game AI and physics could not compute millions of pixel values per second fast enough to maintain fluid motion. This pressure drove the creation of dedicated hardware to offload the repetitive work of the graphics pipeline.
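
To see why a serial processor struggled, divide the frame budget by the number of pixels in a frame. The sketch below assumes a 640x480 frame and a 100 MHz CPU purely for illustration; neither figure comes from the text.

```python
FPS = 60
WIDTH, HEIGHT = 640, 480   # assumed resolution, for illustration only
CPU_CLOCK_HZ = 100e6       # assumed mid-90s CPU clock, for illustration only

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS
budget_ns = 1e9 / FPS                        # one frame's budget, in nanoseconds
ns_per_pixel = budget_ns / pixels_per_frame  # time available for each pixel
cycles_per_pixel = ns_per_pixel * CPU_CLOCK_HZ / 1e9

print(f"{pixels_per_frame:,} pixels/frame, {pixels_per_second:,} pixels/s")
print(f"{ns_per_pixel:.1f} ns (~{cycles_per_pixel:.1f} CPU cycles) per pixel")
```

Under these assumptions the CPU gets only about 54 ns, or roughly five clock cycles, per pixel, and that is before it spends any time on AI and physics.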

2. Scan Line Interleave (SLI)

Before single GPUs contained large internal arrays of parallel processing units, 3dfx introduced Scan Line Interleave (SLI). By having two physical cards render alternating horizontal scan lines, so that each card handled roughly half of every frame, SLI marked the industry's shift in focus from single-thread speed to raw "brute force" throughput.
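
The interleaving idea can be sketched in a few lines: even-numbered scan lines go to one card, odd-numbered lines to the other, and the two sets of rows are merged into the final frame. This is a minimal sketch, not how 3dfx hardware worked internally; the `shade` function and the tiny frame size are hypothetical stand-ins for real rasterization.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 6  # tiny hypothetical frame


def shade(x: int, y: int) -> int:
    """Hypothetical per-pixel computation standing in for real rendering."""
    return (x * 31 + y * 17) % 256


def render_lines(card_id: int, num_cards: int) -> dict[int, list[int]]:
    """Render only the scan lines assigned to this card (y % num_cards == card_id)."""
    return {
        y: [shade(x, y) for x in range(WIDTH)]
        for y in range(card_id, HEIGHT, num_cards)
    }


def render_sli(num_cards: int = 2) -> list[list[int]]:
    """Each 'card' renders its interleaved lines concurrently; rows are merged by index."""
    merged: dict[int, list[int]] = {}
    with ThreadPoolExecutor(max_workers=num_cards) as pool:
        for part in pool.map(lambda c: render_lines(c, num_cards), range(num_cards)):
            merged.update(part)
    return [merged[y] for y in range(HEIGHT)]


single_card = [[shade(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]
# Interleaving changes which card computes a line, not the final image.
assert render_sli(2) == single_card
```

Because no scan line depends on any other, the work splits cleanly with no communication between cards until the merge. Python threads here only model the split; they do not deliver the real hardware speedup.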

3. Throughput vs. Latency

Early GPU designs devoted silicon area to many simple arithmetic units rather than to complex branch prediction. This "wide and slow" philosophy let GPUs handle the repetitive math of processing triangles while the CPU focused on the non-parallel logic.
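
A back-of-the-envelope model makes the trade-off concrete. The lane counts and clock rates below are hypothetical round numbers, not specifications of any real chip: a "narrow and fast" processor completes one operation per cycle at a high clock, while a "wide and slow" processor completes many operations per cycle at a lower clock.

```python
import math


def run_time_s(num_ops: int, lanes: int, clock_hz: float) -> float:
    """Time to finish num_ops independent operations, assuming each lane
    completes one operation per clock cycle (an idealized model)."""
    cycles = math.ceil(num_ops / lanes)
    return cycles / clock_hz


# Hypothetical design points, for illustration only.
CPU = {"lanes": 1, "clock_hz": 3.0e9}     # narrow and fast
GPU = {"lanes": 1024, "clock_hz": 1.0e9}  # wide and slow

for n in (1, 1_000_000):
    t_cpu = run_time_s(n, **CPU)
    t_gpu = run_time_s(n, **GPU)
    winner = "CPU" if t_cpu < t_gpu else "GPU"
    print(f"{n:>9,} ops: CPU {t_cpu:.2e} s, GPU {t_gpu:.2e} s -> {winner}")
```

A single operation finishes sooner on the narrow, fast design (lower latency), while a million independent operations finish far sooner on the wide design (higher throughput). The model ignores memory bandwidth, branching, and CPU vector units, so it illustrates the philosophy rather than predicting real performance.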

QUESTION 1

What is the specific 'time budget' required for 60 frames per second (FPS)?

33.33ms

16.67ms

10.00ms

100.00ms

QUESTION 2

How did 3dfx's SLI achieve early parallelism in consumer hardware?

By increasing the clock speed of a single chip.

By having two cards render alternating horizontal scan lines.

By sharing AI logic between the GPU and CPU.

By reducing the resolution of the frame.

QUESTION 3

Why did the GPU diverge from the standard multicore trajectory of CPUs?

GPUs needed deeper caches for complex branching.

GPUs prioritize throughput of simple math over low-latency serial logic.

CPUs became too expensive to manufacture for 3D graphics.

GPU architectures were designed to be smaller than CPUs.

QUESTION 4

In the context of 1990s gaming, what was the 'Real-Time Imperative'?

The requirement to run physics simulations on the GPU.

Processing millions of pixels within the strict frame window.

The transition from 16-bit to 32-bit computing.

Allowing the CPU to handle rasterization.

QUESTION 5

What is meant by the GPU's 'Wide and Slow' philosophy?

Using many simple processors at lower clock speeds to do massive work.

Designing physically wide chips that take longer to process data.

A design that favors high latency but high memory capacity.

Optimizing for single-threaded serial logic.